European Journal of Educational Research

EU-JER is a peer-reviewed, online academic research journal.

Publisher (HQ)

Eurasian Society of Educational Research
7321 Parkway Drive South, Hanover, MD 21076, USA

'classical test theory' Search Results



The Development of an Instrument to Measure the Higher Order Thinking Skill in Physics

higher order thinking skill physics instrument

Syahrul Ramadhan, Djemari Mardapi, Zuhdan Kun Prasetyo, Heru Budi Utomo


...

This study developed a diagnostic test to measure the higher-order thinking skills (HOTS) of first-grade senior high school students in Bima district, West Nusa Tenggara. The instrument was developed using a modified Oreondo model comprising two activities: test design and test trials. Content validity was analysed with the Aiken formula, the classical test theory analysis used Iteman 4.3, the Rasch model analysis used Winsteps, and reliability was analysed in SPSS. The study concludes that the developed instrument has the characteristics of a useful instrument and meets the requirements for measurement. The analysis results confirm that the instrument achieved content validity by expert judgment and obtained empirical evidence under both classical test theory and the Rasch model.
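As an illustration of the Aiken formula mentioned in this abstract, the following is a minimal sketch of Aiken's V for a single item, V = Σ(r − lo) / (n · (hi − lo)), where r is each expert's rating and lo and hi are the lowest and highest scale categories. The function name, the 1-5 scale, and the five ratings are hypothetical and are not taken from the study.

```python
def aikens_v(ratings, lo, hi):
    """Aiken's V for one item rated by several experts.

    ratings: expert ratings on an ordinal scale from lo to hi.
    Returns a value between 0 (all experts chose lo) and 1 (all chose hi).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if any(r < lo or r > hi for r in ratings):
        raise ValueError("rating outside the lo..hi scale")
    n = len(ratings)
    return sum(r - lo for r in ratings) / (n * (hi - lo))


# Hypothetical relevance ratings from five experts on a 1-5 scale.
example_ratings = [5, 4, 5, 4, 5]
v = aikens_v(example_ratings, lo=1, hi=5)
print(f"Aiken's V = {v:.2f}")
```

Whether a given V counts as acceptable depends on the number of raters and scale categories, so any cut-off should come from the study itself or from Aiken's published tables.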

DOI: 10.12973/eu-jer.8.3.743
Pages: 743-751
Views: 1008
Downloads: 939
Citations: Crossref 18, Scopus 36

...

This research aims to (1) compare two test-equating procedures, the Haebara and Stocking-Lord methods, and (2) describe the characteristics of each equating method using the Windows program IRTEQ. The research employs a participatory approach, with data collected through questionnaires based on the 2018 National Examination Administration. The samples are classified into group A and group B, with 449 and 502 respondents, respectively. The paper discusses how to equate tests through shared (anchor) items, using instruments consisting of 35 questionnaire items and 6 shared items. PARSCALE is used to estimate each respondent’s ability and each item’s characteristics, and the shared items are then equated with IRTEQ. The results show a significant difference between the Haebara method (0.592), which produces the larger mean-sigma value, and the Stocking-Lord method (0.00213). The results also suggest that the shared items may improve discrimination among respondents and increase the difficulty level (parameter b). Given the availability of shared items, it is appropriate to equate two different tests taken by groups with different ability (theta) levels.
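The abstract refers to a mean-sigma value. As a hedged illustration, the sketch below implements mean-sigma linking, a simple moment-based method that is distinct from the Haebara and Stocking-Lord characteristic-curve methods compared in the study. It finds the slope A and intercept B that place anchor-item difficulties from one form on the scale of the other. The six difficulty values per form are hypothetical, and nothing here reproduces IRTEQ or PARSCALE output.

```python
from statistics import mean, pstdev


def mean_sigma(b_from, b_to):
    """Mean-sigma linking constants from anchor-item difficulties.

    b_from: difficulties of the anchor items estimated on the source form.
    b_to:   difficulties of the same items estimated on the target form.
    Returns (A, B) such that b_target is approximately A * b_source + B.
    """
    if len(b_from) != len(b_to) or len(b_from) < 2:
        raise ValueError("need the same anchor items (at least two) on both forms")
    sd_from = pstdev(b_from)
    if sd_from == 0:
        raise ValueError("anchor difficulties on the source form do not vary")
    slope = pstdev(b_to) / sd_from
    intercept = mean(b_to) - slope * mean(b_from)
    return slope, intercept


# Hypothetical difficulty estimates for six shared items on two forms.
b_form_a = [-1.2, -0.5, 0.0, 0.4, 0.9, 1.6]
b_form_b = [-0.9, -0.3, 0.3, 0.6, 1.2, 1.8]

A, B = mean_sigma(b_form_a, b_form_b)
linked = [A * b + B for b in b_form_a]  # form A difficulties on the form B scale
print(f"A = {A:.3f}, B = {B:.3f}")
```

By construction, the linked difficulties have the same mean and standard deviation as the target-form difficulties. Characteristic-curve methods such as Haebara and Stocking-Lord instead minimise differences between item or test characteristic curves.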

DOI: 10.12973/eu-jer.8.4.1071
Pages: 1071-1079
Views: 530
Downloads: 626
Citations: Crossref 3, Scopus 2

Psychometric Assessment and Cross-Cultural Adaptation of the Grit-S Scale among Omani and American Universities’ Students

grit psychometric properties achievement goal orientations cross-cultural study

Amal Alhadabi, Said Aldhafri, Hussain Alkharusi, Ibrahim Al-Harthy, Hafidha AlBarashdi, Marwa Alrajhi


...

The current study aimed to adapt and assess the psychometric properties and measurement invariance of Grit-S among Omani and American students (N = 487) using Exploratory Factor Analysis (EFA) and Multi-Group Confirmatory Factor Analysis (CFA). The scale’s construct validity was estimated by investigating its associations with achievement goal orientations (AGOs). EFA results suggested that a two-factor solution (i.e., perseverance of effort [G_PE] and consistency of interest [G_CI]) was the best factorial structure, explaining 47.74% and 51.02% of the variance in the Omani and American samples, respectively. The factors had good reliability coefficients in the two samples. Regarding intercultural differences, G_PE explained more variance among Omanis (31.02%) than in the American sample, whereas G_CI explained a larger proportion of variance among Americans (36.86%) than in the Omani sample. The first level of measurement invariance, configural invariance, was not supported, necessitating the investigation of the other levels of measurement invariance using a new sample. Grit correlated positively with mastery and performance-approach goals (r = .29 and .12, respectively) and negatively with avoidance goals (r = -.25), supporting the scale’s construct validity. These findings showed that the Grit-S scale can be used as a valid and reliable tool to assess student interest and perseverance in the academic context in Arabic/Omani and American cultures.

DOI: 10.12973/eu-jer.8.4.1175
Pages: 1175-1191
Views: 330
Downloads: 565
Citations: Crossref 6, Scopus 6

...

Social media (SM) use is a rapidly growing phenomenon among Millennials. Thus, a growing body of studies has explored the beneficial applications and negative consequences of their use in an increasingly virtual world. The current study aimed to develop and validate a scale that measures university students’ motives for using SM from a psychological and social perspective. In Study 1 (N = 316), the psychometric properties of SM motives were examined. The estimated factorial structure was validated in Study 2 (N = 200). The Study 1 results showed two active personal motives scales (i.e., self-actualization and purposive motives), one passive motive scale (i.e., enjoyment), one active contextual motive scale (i.e., self-enhancement), and a contextual (neither active nor passive) motive scale (i.e., a factor of convenience). Study 2 findings confirmed this factorial structure. Construct validity was supported with significant differences between three types of users (i.e., productive, consuming, and disinterested) on their motives.

DOI: 10.12973/eu-jer.9.2.835
Pages: 835-851
Views: 776
Downloads: 741
Citations: Crossref 4, Scopus 4

...

This study aims to design mathematical literacy instruments that have evidence of content and construct validity and are reliable for use as an Assessment for Learning. The research involved eight experts as instrument validators and 273 eighth-grade junior high school students in Yogyakarta Province. The ten mathematical literacy items developed had Aiken's V coefficients ranging from 0.781 to 0.906 (> 0.75). Sampling adequacy testing with KMO and Bartlett's test gave a Bartlett chi-square of 608,608 with a p-value < 0.05 and a KMO value of 0.781 (> 0.5). Testing of the measurement model with Confirmatory Factor Analysis (CFA) produced a Root Mean Square Error of Approximation (RMSEA) of 0.049 (≤ 0.08), a chi-square of 33.92 (< 2df), and a p-value of 0.05004 (≥ 0.05). Nine of the ten items had t-values > 1.96 and Standardized Loading Factors (SLF) greater than the critical limit (> 0.3), and Construct Reliability (CR) was 0.78 (> 0.7). It can be concluded that the developed mathematical literacy instrument measures what it is intended to measure, that nine items significantly reflect the construct or latent variable, and that the scores show a good level of consistency.

DOI: 10.12973/eu-jer.9.2.865
Pages: 865-875
Views: 1061
Downloads: 1046
Citations: Crossref 9, Scopus 12

...

New Inquiry-Based Learning (NIBL) was developed to improve students’ multiple higher-order thinking skills (MHOTS), such as thinking critically, analytically, creatively, and practically (CACP). This study aimed to examine the increase in students’ MHOTS ability, their perceptions of the NIBL model, and the contribution of the NIBL model to learning outcomes. A quasi-experiment with a nonequivalent control group design was implemented. The research subjects were university students majoring in chemistry education and enrolled in the Organic Chemistry course. The experiment and control groups consisted of 34 and 32 students, respectively. The collected data were analyzed using t-test and ANCOVA procedures. N-Gain scores were calculated to measure the differences in the increase in learning outcomes, and eta-squared values measured the contribution of NIBL. The results revealed differences in the learning outcomes of the experiment and control groups. The CACP thinking skills and the mastery of organic chemistry concepts of the experiment group increased significantly. Its N-Gain scores were in the medium category for practical thinking skills and in the high category for critical, analytical, and creative thinking, as well as for mastery of organic chemistry concepts. For the control group, the N-Gain scores of all aspects were in the low or medium categories. The NIBL model effectively improved the prospective chemistry teachers’ MHOTS in terms of CACP thinking skills and contributed significantly to the increase in the students’ mastery of organic chemistry concepts.
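To make the N-Gain categories in this abstract concrete, the sketch below computes a normalized gain, g = (post − pre) / (max − pre), and maps it to low, medium, or high. The abstract does not state its cut-offs, so the thresholds used here (0.3 and 0.7, as commonly attributed to Hake) are an assumption, as are the function names and the example scores.

```python
def n_gain(pre, post, max_score=100.0):
    """Normalized gain: the share of the possible improvement actually achieved."""
    if pre >= max_score:
        raise ValueError("pre-test score must be below the maximum score")
    return (post - pre) / (max_score - pre)


def gain_category(g, low_cut=0.3, high_cut=0.7):
    """Classify a gain with assumed cut-offs (0.3 and 0.7, not from the abstract)."""
    if g >= high_cut:
        return "high"
    if g >= low_cut:
        return "medium"
    return "low"


# Hypothetical (pre, post) score pairs out of 100.
for pre, post in [(40, 85), (50, 70), (60, 65)]:
    g = n_gain(pre, post)
    print(pre, post, round(g, 3), gain_category(g))
```

Studies differ on whether g is computed per student and then averaged, or computed once from class means, and the two approaches can give different values.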

DOI: 10.12973/eu-jer.9.3.1309
Pages: 1309-1325
Views: 1211
Downloads: 844
Citations: Crossref 3, Scopus 5

...

The current study investigated the psychometric properties of the Student-Teacher Relationship Measure (STRM) using Rasch analysis in a sample of middle school female students (N = 995). Rasch Principal Components Analysis revealed psychometric support for two subscales (i.e., Academic and Social Relations). Summary statistics showed good psychometric properties. The category structure and individual statistics (i.e., item and person infit and outfit) were not ideal. Category structure showed that the distances between adjacent thresholds were lower than the optimal criteria. Even though findings indicated that item mean square statistics (MNSQ) were optimal, standardized fit statistics (i.e., ZSTD) reflected many misfitting persons and items in each subscale. After the misfitting persons and items were eliminated, the two subscales met the Rasch optimal criteria. The updated short 22-item scale had good psychometric properties, high item and person separation, and good item and person reliability for the two subscales, and can be used as a reliable and valid scale.

DOI: 10.12973/eu-jer.10.2.957
Pages: 957-973
Views: 426
Downloads: 482
Citations: Crossref 2, Scopus 2

Construction of the Character Assessment Instrument for 21st Century Students in High Schools

assessment construct character validity reliability

Wiwin Mistiani, Edi Istiyono, Amir Syamsudin


...

The study of character has become a very important topic in the 21st century, so the integration of character values is very important both in the educational process and in educational assessment. The purpose of this study was to test the validity and reliability of a character assessment instrument for 21st-century high school students. The research approach was quantitative, with a sample of 200 high school students. Data analysis included validity and reliability tests. The results showed that the construct of the student character assessment instrument was valid and reliable. The content validity test gave Aiken's V values > .80, in the high category. In the construct validation test with EFA, all variables had factor loadings > .5. In the CFA test, the model was declared fit, with an estimated standardized loading value of .40 and t-values > 1.96. In reliability testing, the instrument obtained composite reliability > .70 and Cronbach's alpha > .70, which means it is reliable. The instrument is therefore declared valid and reliable for measuring the character of high school students.

DOI: 10.12973/eu-jer.11.2.935
Pages: 935-947
Views: 481
Downloads: 746
Citations: Crossref 2, Scopus 3

...

Visual representations and the process of visualisation have an important role in geometry learning. The optimal use of visual representations in complex multimedia environments has been an important research topic since the end of the last century. For the purpose of the study presented in this paper, we designed a model of learning geometry with the use of digital learning resources, such as dynamic geometry programmes and applets, which foster visualisation. Students explore geometric concepts through the manipulation of interactive virtual representations. This study aims to explore whether learning geometry with digital resources is reflected in higher student achievement in solving geometric problems. It also aims to explore the role of graphical representations (GRs) in solving geometric problems. The results of the survey show a positive impact of the model of teaching on student achievement. In the post-test, students in the experimental group (EG) performed significantly better than students in the control group (CG) in the overall number of points, in solving tasks without GRs, and in calculating the area and the perimeter of triangles and quadrilaterals, in all cases with a small effect size. The authors therefore argue for the use of digital technologies and resources in geometry learning, because interactive manipulatives support the transition between representations at the concrete, pictorial and symbolic (abstract) levels and are therefore important for understanding mathematical concepts, as well as for exploring relationships, making precise GRs, formulating and proving assumptions, and applying different problem-solving strategies.

DOI: 10.12973/eu-jer.11.3.1393
Pages: 1393-1411
Views: 2008
Downloads: 843
Citations: Crossref 6, Scopus 6

...

This developmental research aims to develop a good mathematics test instrument using polytomous responses, based on classical and modern test theories. The research design uses the Plomp model, which consists of five stages: (1) preliminary investigation, (2) design, (3) realization/construction, (4) revision, and (5) implementation (testing). The study was conducted in three vocational schools in Lampung Province, Indonesia, and involved 413 students (191 male and 222 female). The data were collected through a questionnaire and a test. The questionnaire was used to identify the assessment instruments currently employed by teachers and was validated by experts in mathematics and educational evaluation. The test was an open polytomous response test of 40 items. The data were analyzed using both classical and modern theories. The results show that (1) the open polytomous response test is in the good category according to both classical and modern theory, although the discrimination power of the test items under classical theory needs several revisions; and (2) the assessment instrument using open multiple-choice polytomous responses can guarantee information on the actual competence of students. This is shown by the agreement between the analysis results obtained from classical and modern theory and the students' arguments when giving reasons for their choices. Therefore, the open polytomous response test can be used as an alternative for learning assessment.

DOI: 10.12973/eu-jer.11.3.1441
Pages: 1441-1462
Views: 352
Downloads: 520
Citations: Crossref 0, Scopus 0

Development of a Survey to Assess Conceptual Understanding of Quantum Mechanics among Moroccan Undergraduates

conceptual understanding learning difficulties quantum mechanics teaching/learning

Khalid Ait bentaleb, Saddik Dachraoui, Taoufik Hassouni, El mehdi Alibrahmi, Elmahjoub Chakir, Aimad Belboukhari


...

We developed a Quantum Mechanics Conceptual Understanding Survey (QMCUS) in this study. The survey was conducted using a quantitative methodology. A multiple-choice survey of 35 questions was administered to 338 undergraduate students. Three experienced quantum mechanics instructors examined the validity of the survey. The reliability of the survey was measured using Cronbach's alpha, Ferguson's delta index, the discrimination index, and the point-biserial correlation coefficient. These indices showed that the developed survey is reliable. The statistical analysis of the students' results using SPSS shows that the students' scores are normally distributed around a score of 7.14. The results of the t-test show that the students' scores are below the required threshold, which means that it is still difficult for the students to understand the concepts of quantum mechanics. The obtained results allow us to draw some conclusions. The students' difficulties in understanding quantum concepts are due to the nature of these concepts, which are abstract and counterintuitive. In addition, the learners did not have frequent contact with the subatomic world, which led them to adopt misconceptions. Moreover, students find it difficult to imagine and conceptualize quantum concepts, so subatomic phenomena are still explained with classical paradigms. Another difficulty is the lack of prerequisites and the difficulties in using the mathematical formalism and its translation into Dirac notation.
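One of the reliability indices named in this abstract, Cronbach's alpha, can be sketched as follows: alpha = k / (k − 1) · (1 − Σ item variances / variance of total scores), where k is the number of items. The 5 × 4 matrix of 0/1 responses is hypothetical and unrelated to the QMCUS data, and population variances are used throughout (an assumption; a sample-variance version is equally common and must be applied consistently).

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of item scores.

    scores: list of rows, one row per respondent, one column per item.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("all respondents must answer the same items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) responses: five respondents, four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

For these toy responses the item variances sum to 0.8 and the total-score variance is 2, so alpha works out to 0.8.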

DOI: 10.12973/eu-jer.11.4.2219
Pages: 2219-2243
Views: 240
Downloads: 359
Citations: Crossref 2, Scopus 1

The Development of a Four-Tier Diagnostic Test Based on Modern Test Theory in Physics Education

developing test four-tiers diagnostic test modern test theory

Edi Istiyono, Wipsar Sunu Brams Dwandaru, Kharisma Fenditasari, Made Rai Suci Shanti Nurani Ayub, Duden Saepuzaman


...

Diagnostic tests are generally two- or three-tier and based on classical test theory. In this research, the Four-Tier Diagnostic Test (FTDT) was developed based on modern test theory to determine levels of understanding of physics: scientific conception (SC), lack of knowledge (LK), misconception (MSC), false negatives (FN), and false positives (FP). The goals of the study are to (a) find the FTDT constructs, (b) test the quality of the FTDT, and (c) describe students' conceptual understanding of physics. The development process was conducted in planning, testing, and measurement phases. The FTDT consists of 100 four-tier multiple-choice items tested on 700 high school students in Yogyakarta. Under the partial credit model (PCM), the students' responses take the form of eight categories of polytomous data. The results of the study show that (a) the FTDT is built on the aspects of translation, interpretation, extrapolation, and explanation, with each aspect consisting of 25 items with five anchor items; (b) the FTDT is valid, with Aiken's V values in the range of 0.85-0.94, and the items fit the PCM with Infit Mean Square (INFIT MNSQ) of 0.77-1.30, an item difficulty index of 0.12-0.38, and a Cronbach's alpha reliability coefficient of 0.9; (c) the percentages of conceptual understanding of physics, from largest to smallest, are LK type 2 (LK2), FP, LK type 1 (LK1), FN, LK type 3 (LK3), SC, LK type 4 (LK4), and MSC. The percentage sequence of MSC by material is momentum, Newton's law, particle dynamics, harmonic motion, work, and energy. In addition, failure to understand the concept occurs, in sequence, in Newton's law, particle dynamics, work and energy, momentum, and harmonic motion.

DOI: 10.12973/eu-jer.12.1.371
Pages: 371-385
Views: 379
Downloads: 418
Citations: Crossref 0, Scopus 0

Study Item Parameters of Classical and Modern Theory of Differential Aptitude Test: Is it Comparable?

classical test theory differential aptitude test item parameter modern test theory

Farida Agus Setiawati, Rizki Nor Amelia, Bambang Sumintono, Edi Purwanta


...

This study aimed to estimate the Classical Test Theory (CTT) and Modern Test Theory (MTT) item parameters of the Differential Aptitude Test (DAT) and to examine their comparability. The item parameters studied are difficulty level and discrimination index. The data were 5,024 DAT sub-test results documented by the Department of Psychology and Guidance and Counselling bureau. The classical and modern item parameters were estimated and then correlated to examine their comparability. The results show that there is a significant correlation between the item parameter estimates. The Rasch and IRT 1-PL models have the highest correlation with CTT for item difficulty level, whereas the 2-PL model has the highest correlation with CTT for the item discrimination index. Overall, the study concluded that CTT and MTT were comparable in estimating item parameters of the DAT and thus could be used independently or in a complementary way in developing the DAT.
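The two CTT item parameters named in this abstract can be sketched in a few lines. Difficulty is taken as the proportion of correct responses, and discrimination as the correlation between an item and the rest score (total score minus that item), which is one common definition and an assumption here, since the abstract does not say which index the authors used. The response matrix is hypothetical, and the IRT side of the comparison (Rasch, 1-PL, 2-PL) is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when a variable is constant")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def ctt_item_parameters(responses):
    """Return (difficulty, discrimination) lists for a 0/1 response matrix."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    difficulty, discrimination = [], []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [t - i for t, i in zip(totals, item)]  # exclude the item itself
        difficulty.append(sum(item) / len(item))      # proportion correct
        discrimination.append(pearson(item, rest))    # item-rest correlation
    return difficulty, discrimination


# Hypothetical responses: six examinees, four dichotomous items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
difficulty, discrimination = ctt_item_parameters(responses)
for j, (p, r) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"item {j}: difficulty = {p:.2f}, discrimination = {r:.2f}")
```

Note that CTT difficulty runs opposite to the IRT b parameter: a higher proportion correct means an easier item, so the two are expected to correlate negatively.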

DOI: 10.12973/eu-jer.12.2.1097
Pages: 1097-1107
Views: 290
Downloads: 340
Citations: Crossref 2, Scopus 1

Measurement of Students' Chemistry Practicum Skills Using Many Facets Rash Model

chemistry practicum mfrm performance assessment process assessment product assessment

Melly Elvira, Heri Retnawati, Eli Rohaeti, Syamsir Sainuddin


...

The accuracy of assessing the capabilities of the process and product in chemical practice activities requires appropriate measurement procedures to be followed. It is crucial to identify the components that can introduce bias while measuring student abilities during the measurement process. This study aims to identify the components or criteria used by teachers to assess student performance in practicum activities and analyze the quality of the rubrics developed. The study was conducted with the participation of three raters, 27 high school students, and nine assessment criteria. A quantitative descriptive approach was employed using the many-facet Rasch model (MFRM) analysis for measurement. The results of the MFRM analysis show no significant measurement bias, with data measurement facets fitting the MFRM model. The reliability of all the facets meets the criteria, and the scale predictor functions appropriately. While all students can easily pass four out of nine items, five items can only be partially passed by students. The assessment criteria that require special attention include communication skills, tools and assembly, interpretation, cleanliness, and accuracy when performing practicums. These criteria provide feedback for teachers and students to ensure successful practicum activities. The Discussion section of this study delves into the findings and their implications.

DOI: 10.12973/eu-jer.12.3.1297
Pages: 1297-1315
Views: 295
Downloads: 306
Citations: Crossref 0, Scopus 0

...